
Add MLflow logging support #66

Merged: RobotSail merged 15 commits into main from add-mlflow-support on Feb 5, 2026


Conversation

@RobotSail (Collaborator) commented on Jan 27, 2026

Summary

  • Add mlflow_wrapper.py for optional MLflow imports with error handling
  • Add mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name to TrainingArgs
  • Update AsyncStructuredLogger to support MLflow logging
  • Add MLflow CLI args to api_train.py and train.py
  • Initialize MLflow at start, log params, finish at end of training

Test plan

  • Verify training works without MLflow (backward compatibility)
  • Test with MLflow tracking URI configured
  • Confirm metrics appear in MLflow dashboard

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional MLflow integration for experiment tracking and logging, with new CLI options to set tracking URI, experiment name, and run name. MLflow runs are initialized, logged (params & metrics), and finalized alongside the existing WandB workflow, and can be enabled per run.
  • Chores

    • Added mlflow to optional dependencies.

@coderabbitai bot commented on Jan 27, 2026

📝 Walkthrough

Walkthrough

Adds optional MLflow logging: new dependency-optional mlflow wrapper, CLI/train args and TrainingArgs fields for MLflow, logger integration to send metrics/params to MLflow, and wiring to initialize/resume and finish MLflow runs during training.

Changes

Cohort / File(s): Summary

MLflow wrapper module (src/mini_trainer/mlflow_wrapper.py)
New module (≈173 lines). Provides dependency-optional MLflow integration with an MLFLOW_AVAILABLE flag, MLflowNotAvailableError, check_mlflow_available(), init(), log_params(), log(), finish(), get_active_run_id(), and run-id persistence via a global _active_run_id.

Logger integration (src/mini_trainer/async_structured_logger.py)
AsyncStructuredLogger.__init__ gained use_mlflow: bool = False. Imports mlflow_wrapper and check_mlflow_available. log() now calls mlflow_wrapper.log(...) when enabled and on the main rank, alongside the existing WandB logic.

Training pipeline & CLI (src/mini_trainer/train.py, src/mini_trainer/api_train.py)
train()/main() extended with MLflow CLI options (mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name) and use_mlflow propagation. Train entrypoints initialize mlflow_wrapper.init(...), call log_params(), and call finish() on completion. api_train.py appends MLflow CLI args to the assembled training command when provided.

Config types & deps (src/mini_trainer/training_types.py, pyproject.toml)
Added TrainingArgs fields mlflow_tracking_uri, mlflow_experiment_name, and mlflow_run_name; added mlflow as an optional dependency in pyproject.toml.
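The walkthrough above describes a dependency-optional wrapper; the import-guard portion could be sketched roughly as follows (a minimal sketch based on the names in the summary; function bodies are illustrative, not the PR's exact code):

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Import mlflow if present; otherwise record its absence instead of failing.
try:
    import mlflow  # noqa: F401
    MLFLOW_AVAILABLE = True
except ImportError:
    MLFLOW_AVAILABLE = False

# Run-id persistence: survives loss of mlflow's thread-local run context.
_active_run_id: Optional[str] = None


class MLflowNotAvailableError(ImportError):
    """Raised when mlflow functions are called but mlflow is not installed."""


def check_mlflow_available(operation: str) -> None:
    """Raise a descriptive error if an mlflow operation is requested without mlflow."""
    if not MLFLOW_AVAILABLE:
        error_msg = (
            f"Attempted to {operation} but mlflow is not installed. "
            "Please install mlflow with: pip install mlflow"
        )
        logger.error(error_msg)
        raise MLflowNotAvailableError(error_msg)
```

With this shape, every public function in the wrapper can call check_mlflow_available() first, so callers get one clear install hint instead of a bare ImportError.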

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI/main()
    participant Train as train()
    participant Logger as AsyncStructuredLogger
    participant Wrapper as mlflow_wrapper
    participant MLflow as MLflow Backend

    CLI->>Wrapper: init(tracking_uri, experiment_name, run_name)
    Wrapper->>MLflow: start_run()/resume_run()
    Wrapper->>Wrapper: store _active_run_id
    Wrapper->>MLflow: log_params(params)

    CLI->>Train: start training (use_mlflow=True)
    Train->>Logger: __init__(..., use_mlflow=True)
    Logger->>Wrapper: check_mlflow_available()

    Train->>Logger: log(metrics, step)
    Logger->>Wrapper: log(data, step)
    Wrapper->>Wrapper: _ensure_run_for_logging()
    Wrapper->>MLflow: log_metrics(data, step)

    Train->>Wrapper: finish()
    Wrapper->>MLflow: end_run()
    Wrapper->>Wrapper: clear _active_run_id

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hopped in code with eager paws,
I wrapped the runs and set the laws,
Metrics bloom where runs resume,
A carrot trail through logs and plume,
Hooray — two backends share the cause!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title 'Add MLflow logging support' accurately and concisely summarizes the main objective of the changeset: introducing MLflow integration as a logging backend alongside the existing WandB support.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 86.67%, above the required threshold of 80.00%.


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/mini_trainer/train.py`:
- Around line 1235-1238: The MLflow enablement check currently only sets
use_mlflow based on mlflow_tracking_uri; change it so use_mlflow is true if any
MLflow-related arg is provided (e.g., mlflow_tracking_uri,
mlflow_experiment_name, mlflow_run_name) so passing --mlflow-experiment-name or
--mlflow-run-name enables MLflow even when tracking URI is omitted; update the
initialization of use_mlflow (the variable referenced in this file) to check all
MLflow args and keep use_wandb logic unchanged.
- Around line 1304-1313: The MLflow init and log_params block (inside the
use_mlflow branch calling mlflow_wrapper.init and mlflow_wrapper.log_params,
currently gated by local_rank == 0 via log_rank_0) must be executed only on the
global rank 0 process to avoid creating one MLflow run per node; change the
guard so that the mlflow_wrapper.init and mlflow_wrapper.log_params calls run
only when the global process rank is 0 (e.g., use torch.distributed.get_rank()
== 0 or an existing global_rank variable) and keep
AsyncStructuredLogger/log_rank_0 behavior unchanged for other logs.
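The first fix above reduces to an any() over the MLflow arguments; a minimal sketch (parameter names taken from the review text, helper name assumed):

```python
from typing import Optional


def resolve_use_mlflow(
    mlflow_tracking_uri: Optional[str],
    mlflow_experiment_name: Optional[str],
    mlflow_run_name: Optional[str],
) -> bool:
    """Enable MLflow if any MLflow-related argument was provided."""
    return any([mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name])
```

Passing only --mlflow-experiment-name or --mlflow-run-name then enables MLflow against its default local backend even when the tracking URI is omitted.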
🧹 Nitpick comments (2)
src/mini_trainer/async_structured_logger.py (1)

77-87: Guard rank checks when the process group isn’t initialized.

dist.get_rank() will throw if torch.distributed isn’t initialized. A small guard keeps MLflow logging safe in single-process runs and aligns with log_sync.

♻️ Suggested tweak
-            if self.use_wandb and dist.get_rank() == 0:
+            is_rank0 = not dist.is_initialized() or dist.get_rank() == 0
+            if self.use_wandb and is_rank0:
                 wandb_data = {k: v for k, v in data.items() if k != "timestamp"}
                 wandb_wrapper.log(wandb_data)

             # log to mlflow if enabled, only on MAIN rank
-            if self.use_mlflow and dist.get_rank() == 0:
+            if self.use_mlflow and is_rank0:
                 step = data.get("step")
                 mlflow_data = {k: v for k, v in data.items() if k != "timestamp"}
                 mlflow_wrapper.log(mlflow_data, step=step)
src/mini_trainer/mlflow_wrapper.py (1)

11-17: Declare MLflow as an optional dependency in pyproject.toml.

MLflow is already treated as optional in the code (try/except import pattern), but it's not declared in [project.optional-dependencies]. Add it alongside the existing wandb entry to make the feature discoverable and enable users to install it with pip install rhai-innovation-mini-trainer[mlflow].

@codecov bot commented on Jan 27, 2026

Codecov Report

❌ Patch coverage is 39.13043% with 56 lines in your changes missing coverage. Please review.

File                                          Patch %   Missing lines
src/mini_trainer/mlflow_wrapper.py            32.20%    40 ⚠️
src/mini_trainer/async_structured_logger.py   56.25%     7 ⚠️
src/mini_trainer/api_train.py                 16.66%     5 ⚠️
src/mini_trainer/train.py                     50.00%     4 ⚠️


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 33-78: log_params() and log() re-enter an already active MLflow
run using mlflow.start_run(run_id=_active_run_id) which raises unless
nested=True is set; update both functions (log_params and log) so their with
mlflow.start_run(...) calls include nested=True (i.e., with
mlflow.start_run(run_id=_active_run_id, nested=True):) to allow re-entrancy into
the existing run when _active_run_id is set.
- Around line 81-128: log_params and log unconditionally call
mlflow.start_run(run_id=_active_run_id) which raises if a run is already active;
modify both functions (log_params, log) to first call mlflow.active_run() and
only invoke mlflow.start_run(run_id=_active_run_id) when active_run is None or
active_run.info.run_id != _active_run_id, otherwise skip starting a run and
directly call mlflow.log_params / mlflow.log_metrics; reference _active_run_id,
mlflow.active_run(), mlflow.start_run(), mlflow.log_params, and
mlflow.log_metrics to locate where to add the conditional guard.

Comment on lines +33 to +78

def check_mlflow_available(operation: str) -> None:
    """Check if mlflow is available, raise error if not."""
    if not MLFLOW_AVAILABLE:
        error_msg = (
            f"Attempted to {operation} but mlflow is not installed. "
            "Please install mlflow with: pip install mlflow"
        )
        logger.error(error_msg)
        raise MLflowNotAvailableError(error_msg)


def init(
    tracking_uri: Optional[str] = None,
    experiment_name: Optional[str] = None,
    run_name: Optional[str] = None,
    **kwargs,
) -> Any:
    """
    Initialize an mlflow run. Raises MLflowNotAvailableError if mlflow is not installed.

    Args:
        tracking_uri: MLflow tracking server URI (e.g., "http://localhost:5000")
        experiment_name: Name of the experiment
        run_name: Name of the run
        **kwargs: Additional arguments to pass to mlflow.start_run

    Returns:
        mlflow.ActiveRun object if successful

    Raises:
        MLflowNotAvailableError: If mlflow is not installed
    """
    global _active_run_id
    check_mlflow_available("initialize mlflow")
    if tracking_uri:
        mlflow.set_tracking_uri(tracking_uri)
    if experiment_name:
        mlflow.set_experiment(experiment_name)
    run = mlflow.start_run(run_name=run_name, **kwargs)
    _active_run_id = run.info.run_id
    return run


def get_active_run_id() -> Optional[str]:
    """Get the active run ID that was started by init()."""
    return _active_run_id

⚠️ Potential issue | 🔴 Critical



🌐 Web query:

MLflow start_run nested calls re-entrance active run behavior documentation

💡 Result:

  • Default (non-nested) behavior: If there’s already an active run in the current thread, calling mlflow.start_run() again raises an exception (“Run with UUID … is already active … end_run() … or start_run(nested=True)”). This is enforced by checking the thread-local active run stack and requiring nested=True to start another run while one is active. [1]

  • Nested runs (nested=True): mlflow.start_run(nested=True) creates a child run under the current active run (the parent is taken from the top of the active-run stack) and pushes the new run onto the stack; child/parent linkage is recorded via the mlflow.parentRunId tag. [1], [2]

  • Explicit parent (parent_run_id=..., newer API): You can start a run nested under a specific parent via start_run(parent_run_id=...). The parent must be ACTIVE, and if there’s already an active run, the provided parent_run_id must match the current active run’s ID (otherwise it errors). [1]

  • “Active run” semantics (re-entrance / stack):

    • mlflow.active_run() returns the top of a thread-local stack (the most recently started, not-yet-ended run in the current thread). [2]
    • mlflow.end_run() pops the top run from that stack and terminates it. [3]
    • When using with mlflow.start_run() as run: ..., the context manager’s __exit__ ends the run only if it’s still present in the active-run stack, checking by run ID (not object identity) to handle cases where tools stop and restart a run with the same ID (a practical “re-entrance” safeguard). [4]

Sources:
[1] MLflow latest mlflow.tracking.fluent.start_run implementation/docs (module source) (mlflow.org)
[2] MLflow latest mlflow.tracking.fluent.active_run docs/implementation (mlflow.org)
[3] MLflow latest mlflow.tracking.fluent.end_run docs/implementation (mlflow.org)
[4] MLflow latest ActiveRun.__exit__ behavior (module source) (mlflow.org)


Add nested=True to start_run() calls in log_params() and log() functions

The log_params() (line 96) and log() (line 125) functions attempt to re-enter the active run via mlflow.start_run(run_id=_active_run_id), but MLflow raises an exception when start_run() is called without nested=True while a run is already active. This breaks the normal usage flow: init() → log_params()/log() → finish().

Pass nested=True to both start_run() calls to explicitly allow re-entrance into the same run:

Fix
# In log_params() - line 96
if _active_run_id:
    with mlflow.start_run(run_id=_active_run_id, nested=True):
        mlflow.log_params(str_params)

# In log() - line 125  
if _active_run_id:
    with mlflow.start_run(run_id=_active_run_id, nested=True):
        mlflow.log_metrics(metrics, step=step)

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 96-110: The log_params function currently assumes an active mlflow
run; update mlflow_wrapper.log_params to mirror the async-safe pattern used in
log(): after calling check_mlflow_available("log params to mlflow"), detect
mlflow.active_run() and if it's None use the module-level fallback
_active_run_id to call mlflow.start_run(run_id=_active_run_id) (or use
mlflow.log_params with the run context started) so parameters are logged against
the correct run when thread-local context is lost; ensure you still convert
params to strings before logging and maintain the same error behavior.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 110-119: The current logging uses
mlflow.start_run(run_id=_active_run_id) which as a context manager will call
mlflow.end_run() and finish the run; instead, when active_run is None but
_active_run_id is set, use MlflowClient to log directly by run id (e.g., replace
the with mlflow.start_run(...) blocks around mlflow.log_params and the similar
block for metrics) by calling MlflowClient().log_params(run_id=_active_run_id,
params=str_params) and MlflowClient().log_metric(s)/log_batch for metrics, so
the stored run (_active_run_id) is not ended prematurely and future logs
continue to target the same run.
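The fix described above, logging directly against the stored run id so the fluent context manager never calls end_run(), could look roughly like this (a sketch; log_metrics_to_run is a hypothetical helper name, and the client parameter exists only to make the sketch testable):

```python
# Guarded import: the sketch degrades to a no-op when mlflow is absent.
try:
    from mlflow.tracking import MlflowClient
    MLFLOW_AVAILABLE = True
except ImportError:
    MLFLOW_AVAILABLE = False


def log_metrics_to_run(run_id, metrics, step=None, client=None):
    """Log metrics against run_id via the client API; never touches end_run().

    Unlike `with mlflow.start_run(run_id=...)`, MlflowClient.log_metric targets
    the run by id directly, so the stored run stays open for future log calls.
    """
    if client is None:
        if not MLFLOW_AVAILABLE:
            return False  # mlflow not installed: silently skip in this sketch
        client = MlflowClient()
    for key, value in metrics.items():
        client.log_metric(run_id, key, float(value), step=step)
    return True
```
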
🧹 Nitpick comments (1)
src/mini_trainer/mlflow_wrapper.py (1)

45-88: Add a guard to reuse an existing active run and make init() idempotent.
When a run is already active, mlflow.start_run() raises an exception. Checking for an existing run and reusing it prevents hard failures and handles the common case where init() is called multiple times.

♻️ Suggested guard for existing active runs
-    run = mlflow.start_run(run_name=run_name, **kwargs)
-    _active_run_id = run.info.run_id
-    return run
+    active_run = mlflow.active_run()
+    if active_run and not kwargs.get("nested") and not kwargs.get("parent_run_id"):
+        _active_run_id = active_run.info.run_id
+        return active_run
+
+    run = mlflow.start_run(run_name=run_name, **kwargs)
+    _active_run_id = run.info.run_id
+    return run

The guard preserves nesting semantics: if the caller explicitly requests a nested or child run (nested=True or parent_run_id), a new run is created; otherwise, the existing active run is reused.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 45-90: The init function can raise TypeError if callers include
run_name in kwargs as well as the explicit run_name parameter; before calling
mlflow.start_run, pop and prefer the explicit run_name from kwargs (e.g., if
"run_name" in kwargs: kwargs.pop("run_name")) so the explicit run_name argument
takes precedence and no duplicate keyword is passed to mlflow.start_run; update
the call site (mlflow.start_run(run_name=run_name, **kwargs)) in init to use the
sanitized kwargs.
🧹 Nitpick comments (1)
src/mini_trainer/mlflow_wrapper.py (1)

98-125: DRY the active-run resolution for params/metrics.

The active-run branching is duplicated in two places. A small helper reduces drift risk while preserving behavior.

♻️ Optional refactor
+def _ensure_run_for_logging() -> None:
+    active_run = mlflow.active_run()
+    if not active_run and _active_run_id:
+        # No active run in this thread but we have a stored run ID
+        mlflow.start_run(run_id=_active_run_id)
+
 def log_params(params: Dict[str, Any]) -> None:
@@
-    # Check if there's an active run (same pattern as log() for async safety)
-    active_run = mlflow.active_run()
-    if active_run:
-        # Run is active, log directly
-        mlflow.log_params(str_params)
-    elif _active_run_id:
-        # No active run in this thread but we have a stored run ID - resume it
-        # This can happen in async contexts where thread-local context is lost
-        # Note: We don't use context manager here because it would end the run on exit
-        mlflow.start_run(run_id=_active_run_id)
-        mlflow.log_params(str_params)
-    else:
-        # No run context at all - log anyway (MLflow will create a run)
-        mlflow.log_params(str_params)
+    _ensure_run_for_logging()
+    mlflow.log_params(str_params)

 def log(data: Dict[str, Any], step: Optional[int] = None) -> None:
@@
     if metrics:
-        # Check if there's an active run
-        active_run = mlflow.active_run()
-        if active_run:
-            # Run is active, log directly
-            mlflow.log_metrics(metrics, step=step)
-        elif _active_run_id:
-            # No active run in this thread but we have a stored run ID - resume it
-            # This can happen in async contexts where thread-local context is lost
-            # Note: We don't use context manager here because it would end the run on exit
-            mlflow.start_run(run_id=_active_run_id)
-            mlflow.log_metrics(metrics, step=step)
-        else:
-            # No run context at all - log anyway (MLflow will create a run)
-            mlflow.log_metrics(metrics, step=step)
+        _ensure_run_for_logging()
+        mlflow.log_metrics(metrics, step=step)

Also applies to: 128-161

RobotSail and others added 13 commits February 3, 2026 22:21
- Add mlflow_wrapper.py for optional MLflow imports with error handling
- Add mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name to TrainingArgs
- Update AsyncStructuredLogger to support MLflow logging
- Add MLflow CLI args to api_train.py and train.py
- Initialize MLflow at start, log params, finish at end of training

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enable MLflow when any MLflow arg is provided (not just tracking_uri)
- Only init/finish MLflow on global rank 0 to avoid multiple runs in multi-node

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Store the run ID when initializing MLflow and use it explicitly when
logging params/metrics. This fixes an issue where async logging would
lose the thread-local run context and create a separate MLflow run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous fix incorrectly tried to start a run in log_params() when
the run was already active from init(). Now log_params() logs directly
since the run is already active, and log() only resumes the run if
it's not currently active (for async contexts).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Guard dist.get_rank() when process group isn't initialized in async log()
- Add mlflow as optional dependency in pyproject.toml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement kwarg > env var precedence for mlflow_tracking_uri and
mlflow_experiment_name, matching the behavior of instructlab-training.

The configuration now follows this precedence:
1. Explicit kwargs (highest priority)
2. Environment variables (MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME)
3. MLflow defaults (lowest priority)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
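The precedence described in this commit reduces to a chained fallback; a minimal sketch (resolve_tracking_uri is a hypothetical helper name, not the PR's code):

```python
import os
from typing import Optional


def resolve_tracking_uri(kwarg_value: Optional[str]) -> Optional[str]:
    """Kwarg beats the MLFLOW_TRACKING_URI env var, which beats MLflow's default.

    Returning None lets MLflow fall back to its own default (local ./mlruns).
    """
    if kwarg_value:
        return kwarg_value
    return os.environ.get("MLFLOW_TRACKING_URI") or None
```

The same shape applies to mlflow_experiment_name against MLFLOW_EXPERIMENT_NAME.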
Mirror the pattern used in log() to handle cases where thread-local
MLflow context is lost in async contexts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using `with mlflow.start_run(run_id=...)` as a context manager ends the
run when the block exits, breaking subsequent logging calls. Changed to
call start_run() without context manager to keep the run active.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/mini_trainer/wandb_wrapper.py (1)

27-35: ⚠️ Potential issue | 🟡 Minor

Install hint should match the actual distribution name.

Line 32 references mini-trainer[wandb], but pyproject.toml defines the distribution name as rhai-innovation-mini-trainer. Users following this hint will fail to install the package. Update to the correct distribution name:

🔧 Suggested update
-            "Please install wandb with: pip install 'mini-trainer[wandb]'"
+            "Please install wandb with: pip install 'rhai-innovation-mini-trainer[wandb]'"
🤖 Fix all issues with AI agents
In `@src/mini_trainer/api_train.py`:
- Around line 157-165: The MLflow experiment/run name flags are currently only
added when train_args.mlflow_tracking_uri is set; update the logic in
src/mini_trainer/api_train.py so that the appending of --mlflow-experiment-name
and --mlflow-run-name is done independently of the --mlflow-tracking-uri block:
keep the existing append of
f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}" inside the if
train_args.mlflow_tracking_uri: block, but move the if
train_args.mlflow_experiment_name: and if train_args.mlflow_run_name: checks out
of that block so they append their flags to the command list even when
mlflow_tracking_uri is None (consistent with use_mlflow = any([...])).

In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 28-42: The error message in check_mlflow_available currently
suggests installing "mini-trainer[mlflow]" which doesn't match the package name;
update the error_msg string used in check_mlflow_available (and raised via
MLflowNotAvailableError) to instruct installing
"rhai-innovation-mini-trainer[mlflow]" instead, keeping the same phrasing and
logger.error call and ensuring MLFLOW_AVAILABLE and MLflowNotAvailableError
usage remain unchanged.
- Around line 45-99: The current init function unconditionally does
kwargs.pop("run_name", None), which discards a caller-supplied run name passed
via **kwargs; change the logic so that if the explicit parameter run_name is
None and "run_name" exists in kwargs, promote it by assigning run_name =
kwargs.pop("run_name"), otherwise (when run_name is explicitly provided) remove
"run_name" from kwargs to avoid duplicate arguments before calling
mlflow.start_run; update references around init, run_name, kwargs.pop and
mlflow.start_run accordingly.
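The run_name reconciliation the prompt describes can be sketched as follows (reconcile_run_name is a hypothetical helper mirroring the described kwargs handling before mlflow.start_run):

```python
from typing import Any, Dict, Optional, Tuple


def reconcile_run_name(
    run_name: Optional[str], kwargs: Dict[str, Any]
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Prefer the explicit run_name; otherwise promote one supplied via **kwargs.

    Either way, 'run_name' is removed from kwargs so mlflow.start_run never
    receives a duplicate keyword argument.
    """
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if run_name is None and "run_name" in kwargs:
        run_name = kwargs.pop("run_name")
    else:
        kwargs.pop("run_name", None)
    return run_name, kwargs
```
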
🧹 Nitpick comments (2)
src/mini_trainer/async_structured_logger.py (1)

78-91: Avoid logging step twice to MLflow.

step is already supplied as the step argument; leaving it in mlflow_data records a redundant step metric. Consider filtering it out.

♻️ Suggested tweak
-                mlflow_data = {k: v for k, v in data.items() if k != "timestamp"}
+                mlflow_data = {
+                    k: v for k, v in data.items() if k not in ("timestamp", "step")
+                }
pyproject.toml (1)

65-65: MLflow dependency needs a minimum version constraint for Python 3.12 compatibility.

Current MLflow releases (3.x series) fully support Python 3.11–3.12. However, the declaration mlflow = ["mlflow"] lacks a version constraint and could allow installation of older MLflow 2.x versions, which had Python 3.12 compatibility issues. Add a minimum version constraint such as mlflow = ["mlflow>=3.0"] to ensure users always get a compatible release.

Comment on lines +157 to +165
    # mlflow-related arguments
    if train_args.mlflow_tracking_uri:
        command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
        if train_args.mlflow_experiment_name:
            command.append(
                f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
            )
        if train_args.mlflow_run_name:
            command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")

⚠️ Potential issue | 🟠 Major



MLflow experiment and run name args should be independent of tracking URI.

Lines 160–165 only append --mlflow-experiment-name and --mlflow-run-name when mlflow_tracking_uri is set. However, the training code enables MLflow if any of the three MLflow args are provided (use_mlflow = any([mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name])), and MLflow supports using the default local backend without an explicit tracking URI. Move these two conditionals outside the mlflow_tracking_uri block so they append independently.

🛠️ Suggested fix
-    if train_args.mlflow_tracking_uri:
-        command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
-        if train_args.mlflow_experiment_name:
-            command.append(
-                f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
-            )
-        if train_args.mlflow_run_name:
-            command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")
+    if train_args.mlflow_tracking_uri:
+        command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
+    if train_args.mlflow_experiment_name:
+        command.append(
+            f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
+        )
+    if train_args.mlflow_run_name:
+        command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")

Comment on lines +28 to +42
class MLflowNotAvailableError(ImportError):
    """Raised when mlflow functions are called but mlflow is not installed."""

    pass


def check_mlflow_available(operation: str) -> None:
    """Check if mlflow is available, raise error if not."""
    if not MLFLOW_AVAILABLE:
        error_msg = (
            f"Attempted to {operation} but mlflow is not installed. "
            "Please install mlflow with: pip install 'mini-trainer[mlflow]'"
        )
        logger.error(error_msg)
        raise MLflowNotAvailableError(error_msg)

⚠️ Potential issue | 🟡 Minor



Align MLflow install hint with distribution name.

Line 39 references mini-trainer[mlflow], but the distribution name in pyproject.toml is rhai-innovation-mini-trainer. Update the error message to match.

🔧 Suggested update
-            "Please install mlflow with: pip install 'mini-trainer[mlflow]'"
+            "Please install mlflow with: pip install 'rhai-innovation-mini-trainer[mlflow]'"

@Maxusmusti (Contributor) left a comment

LGTM, all my review comments have been addressed

@RobotSail merged commit 19d9ee6 into main on Feb 5, 2026
10 of 11 checks passed
@coderabbitai bot mentioned this pull request on Feb 5, 2026